Skip to content

feat(github): Add cursor-based pagination to integration repos endpoint#112591

Draft
jaydgoss wants to merge 13 commits intomasterfrom
jaygoss/vdy-46-scm-repo-selector-pre-populate-dropdown-with-initial-repos-BE
Draft

feat(github): Add cursor-based pagination to integration repos endpoint#112591
jaydgoss wants to merge 13 commits intomasterfrom
jaygoss/vdy-46-scm-repo-selector-pre-populate-dropdown-with-initial-repos-BE

Conversation

@jaydgoss
Copy link
Copy Markdown
Member

@jaydgoss jaydgoss commented Apr 9, 2026

Summary

  • Add opt-in cursor-based pagination to GET /organizations/{org}/integrations/{id}/repos/, triggered when a caller sends per_page without search
  • Add get_repositories_paginated() to BaseRepositoryIntegration (returns None by default) with a GitHub override that slices the Django-cached repo list
  • Existing consumers that don't send per_page continue to receive the full repo list unchanged

This supports the SCM onboarding repo selector which needs fast page-at-a-time loading for orgs with many GitHub repos.

Stacks on #112327 (VDY-68 caching).

Test plan

  • Existing OrganizationIntegrationReposTest tests pass (no behavior change for current consumers)
  • New OrganizationIntegrationReposPaginatedTest tests verify:
    • First/second/last page pagination with cursor + Link headers
    • Archived repos excluded
    • installableOnly filter works with pagination
    • Single-page results have no Link header
    • Requests without per_page use the full list (existing behavior)
    • Requests with search + per_page use the full list (search bypasses pagination)

Refs VDY-46

@linear-code
Copy link
Copy Markdown

linear-code bot commented Apr 9, 2026


return response

return self.respond({"detail": "Repositories not supported"}, status=400)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth seeing if there's a way to work this into using our generic self.paginate interface. I don't know if it will work well for an api call like this, but could be worth a quick try to see

per_page = max(1, min(int(request.GET["per_page"]), 100))
except (ValueError, TypeError):
per_page = 100
cursor = self._parse_cursor(request)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should already have api methods that handle producing our cursors, even if you're unable to use the built in self.paginate method it might be worth looking into those.

Copy link
Copy Markdown
Member Author

@jaydgoss jaydgoss Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to self.get_per_page() and self.get_cursor_from_request() in the latest push bc5a262.

The full self.paginate() flow doesn't seem to entirely work here since it expects a Django Paginator with a queryset, and we're paginating over an external API with a custom response shape ({"repos": [...], "searchable": bool}).

So the CursorResult construction for Link headers is still manual, but the input parsing now uses the standard helpers.

This is acceptable for infinite-scroll consumers.
"""
client = self.get_client()
all_repos = client.get_repos_cached()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're just fetching all of these anyway, is there all that much value in paginating on the backend? I know we save on serialization + network cost, but we're still pulling all of these items into memory

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, forgot that /installation/repositories already supports page + per_page params and returns total_count (our existing interactions with that endpoint just aggregate all pages).

Going to rework this to just make a single GH API call per page instead of caching the full list and slicing.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made that change in bc5a262

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

starting to think this pagination is not worth the trouble, the filtering we are applying post page fetch (installable_only, not "archived") is a problem.

only thing I can think that would solve that would be to have get_repositories_paginated fetch pages and filter until it assembles a full page of filtered results.

implement pagination (callers should fall back to
``get_repositories()``).
"""
return None
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably raise NotImplementedError here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The call site is already in a try/except anyway, so NotImplementedError would just be another clause. Will switch to that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made that change in bc5a262

@jaydgoss jaydgoss marked this pull request as draft April 10, 2026 02:29
@jaydgoss jaydgoss marked this pull request as ready for review April 10, 2026 03:06
Base automatically changed from jaygoss/vdy-68-perfgithub-optimize-accessibleonly-repo-search-to-avoid to master April 10, 2026 16:08
Add a new /repos-paginated/ endpoint that paginates over the cached
accessible repo list for GitHub. Other providers fall back to
returning the full list from get_repositories().

This is purpose-built for the SCM onboarding repo selector dropdown,
where we need fast initial page loads via cursor-based pagination.
Search is intentionally excluded -- the FE uses the existing /repos/
endpoint with accessibleOnly for that.

The pagination interface is provider-agnostic:
- BaseRepositoryIntegration.get_repositories_paginated() returns None
  by default (unsupported)
- GitHubIntegration overrides it to slice the Django-cached repo list
- The endpoint dispatches based on the return value

Refs VDY-46
Add opt-in pagination to the existing /repos/ endpoint, triggered
when a caller sends the per_page query param without a search query.
Existing consumers that don't send per_page continue to receive the
full repo list unchanged.

The pagination interface is provider-agnostic:
- BaseRepositoryIntegration.get_repositories_paginated() returns None
  by default (provider does not support pagination)
- GitHubIntegration overrides it to slice the Django-cached repo list
- The endpoint dispatches based on the return value, falling back to
  get_repositories() when pagination is not supported

This supports the SCM onboarding repo selector, which needs fast
page-at-a-time loading for orgs with many GitHub repos.

Refs VDY-46
Promote the local to_repo_info closure to a method on
GitHubIntegration so both get_repositories and
get_repositories_paginated share the same formatting logic.

Refs VDY-46
Clamp per_page to at least 1 to prevent infinite empty pages
(per_page=0) or incorrect has_next (per_page=-1). Add docstring
to get_repositories_paginated noting it always serves from the
cached accessible-repos list.

Refs VDY-46
Wrap get_repositories_paginated in try/except for IntegrationError
and IdentityNotValid so token revocations return 400 instead of 500.

Validate per_page as an integer (non-numeric input like ?per_page=abc
defaults to 100 instead of raising ValueError).

Clamp cursor offset to >= 0 so crafted cursors like 0:-5:0 cannot
produce unexpected slicing behavior.

Refs VDY-46
Add tests for input validation edge cases: per_page=0, per_page=-1,
per_page=abc, per_page=200, negative cursor offset, and
IntegrationError in the paginated path.

Document that accessibleOnly is ignored in the paginated path and
that CursorResult is only used for Link header generation. Remove
redundant return_value=[] from test patch decorators.

Refs VDY-46
Document cache consistency caveat in get_repositories_paginated
docstring: cache expiry between pages can cause duplicates/skips,
acceptable for infinite-scroll.

Make _parse_cursor a staticmethod. Add comment clarifying per_page
and offset are guaranteed set when paginated is not None.

Refs VDY-46
Fix references to the old get_accessible_repos_cached name in
get_repositories_paginated and pagination tests after rebase.
…icing

Paginated repo browsing now makes a single GitHub API call per page
via GET /installation/repositories?page=N&per_page=M instead of
fetching all repos into a cache and slicing. This avoids loading
the full repo list into memory on every request.

Also switches to the built-in get_per_page() and
get_cursor_from_request() helpers, raises NotImplementedError in the
base class instead of returning None, and removes the custom
_parse_cursor method.

Refs VDY-46
Document that archived repos and installableOnly filtering can cause
pages to have fewer items than per_page and has_next to overestimate.
Add test for NotImplementedError fallback when a provider does not
support paginated browsing.

Refs VDY-46
@jaydgoss jaydgoss force-pushed the jaygoss/vdy-46-scm-repo-selector-pre-populate-dropdown-with-initial-repos-BE branch from e5fc935 to 000570b Compare April 10, 2026 16:09
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 10, 2026

Backend Test Failures

Failures on ec4dd74 in this run:

tests/sentry/integrations/api/endpoints/test_organization_integration_repos.py::OrganizationIntegrationReposPaginatedTest::test_not_implemented_falls_back_to_full_listlog
[gw0] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/integrations/api/endpoints/test_organization_integration_repos.py:546: in test_not_implemented_falls_back_to_full_list
    assert response.status_code == 200, response.content
E   AssertionError: b'{"detail":"Internal Error","errorId":null}'
E   assert 500 == 200
E    +  where 500 = <Response status_code=500, "application/json">.status_code
tests/sentry/integrations/api/endpoints/test_organization_integration_repos.py::OrganizationIntegrationReposPaginatedTest::test_without_per_page_uses_full_listlog
[gw0] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/integrations/api/endpoints/test_organization_integration_repos.py:481: in test_without_per_page_uses_full_list
    assert response.status_code == 200, response.content
E   AssertionError: b'{"detail":"Internal Error","errorId":null}'
E   assert 500 == 200
E    +  where 500 = <Response status_code=500, "application/json">.status_code
tests/sentry/integrations/api/endpoints/test_organization_integration_repos.py::OrganizationIntegrationReposPaginatedTest::test_search_with_per_page_uses_full_listlog
[gw1] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/integrations/api/endpoints/test_organization_integration_repos.py:497: in test_search_with_per_page_uses_full_list
    assert response.status_code == 200, response.content
E   AssertionError: b'{"detail":"Internal Error","errorId":null}'
E   assert 500 == 200
E    +  where 500 = <Response status_code=500, "application/json">.status_code

@jaydgoss jaydgoss marked this pull request as draft April 10, 2026 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants